Spatio-temporal Speech Enhancement for Robust Speech Recognition

Authors

Erik Visser

Manabu Otsuka

Te-Won Lee

Abstract

A new speech enhancement scheme is presented integrating spatial and temporal signal processing methods for robust speech recognition in noisy environments. The scheme first separates spatially localized point sources from noisy speech signals recorded by two microphones. Blind source separation algorithms assuming no a priori knowledge about the sources involved are applied in this spatial processing stage. Then denoising of distributed background noise is achieved in a combined spatial/temporal processing approach. The desired speaker signal is first processed along with an artificially constructed noise signal in a supplementary blind source separation step. It is further denoised by exploiting differences in temporal speech and noise statistics in a wavelet filterbank. The scheme’s performance is illustrated by speech recognition experiments on real recordings in a noisy car environment and compared to conventional techniques like beamforming and spectral subtraction.
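The final denoising stage mentioned in the abstract works in a wavelet filterbank. The abstract does not give the details of that stage, so the following is only a minimal sketch of a generic stand-in technique: one-level Haar wavelet shrinkage with soft thresholding. The function names, the choice of the Haar wavelet, and the `threshold` parameter are all assumptions made for illustration and are not taken from the paper.

```python
import math

# Illustrative sketch only: generic one-level Haar wavelet shrinkage.
# This is NOT the authors' filterbank; names and parameters are hypothetical.

def haar_forward(x):
    """One-level Haar transform of an even-length sequence.

    Returns (approx, detail): scaled pairwise sums and differences.
    """
    s = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / s for i in range(0, len(x) - 1, 2)]
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x) - 1, 2)]
    return approx, detail

def haar_inverse(approx, detail):
    """Invert haar_forward, rebuilding the original sample pairs."""
    s = math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) / s)
        out.append((a - d) / s)
    return out

def soft_threshold(coeffs, t):
    """Shrink each coefficient toward zero by t, clamping at zero."""
    return [math.copysign(max(abs(c) - t, 0.0), c) for c in coeffs]

def denoise(x, threshold):
    """Shrink only the detail band, then reconstruct the signal."""
    approx, detail = haar_forward(x)
    return haar_inverse(approx, soft_threshold(detail, threshold))

# Example call on a short, even-length frame (values are made up).
frame = [0.9, 1.1, 0.2, -0.1, 1.0, 1.0]
cleaned = denoise(frame, 0.1)
```

With a threshold of zero the frame is returned unchanged (up to rounding), while a very large threshold replaces each sample pair with its mean. A practical system would use several decomposition levels and a threshold derived from an estimate of the noise statistics.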

Similar resources

A spatio-temporal speech enhancement scheme for robust speech recognition in noisy environments

A spatio-temporal speech enhancement scheme for robust speech recognition

An Information-Theoretic Discussion of Convolutional Bottleneck Features for Robust Speech Recognition

Convolutional Neural Networks (CNNs) have proven effective in speech recognition systems for both feature extraction and acoustic modeling. CNNs have also been used for robust speech recognition, with competitive results reported. The Convolutive Bottleneck Network (CBN) is a kind of CNN that has a bottleneck layer among its fully connected layers. The bottleneck fea...

Improving the performance of MFCC for Persian robust speech recognition

Mel-frequency cepstral coefficients (MFCCs) are the most widely used features in speech recognition, but they are very sensitive to noise. In this paper, to achieve satisfactory performance in Automatic Speech Recognition (ASR) applications, we introduce a new noise-robust set of MFCC vectors estimated through the following steps. First, spectral mean normalization is a pre-processing which applies to t...

Combined Temporal and Spectral Processing Methods for Speech Enhancement

Speech signals from uncontrolled environments may contain degradation components along with the required speech components. The degradation components include background noise, reverberation, and speech from other speakers. Degraded speech gives poor performance in automatic speech processing tasks such as speech recognition and speaker recognition, and is also uncomfortable for human listen...

Publication date: 2003
